Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][BPF] Report Unreachable Behavior from IR #126858

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

yonghong-song
Copy link
Contributor

@yonghong-song yonghong-song commented Feb 12, 2025

Marc Suñé (Isovalent, part of Cisco) reported an issue where an uninitialized variable caused generated bpf prog binary code not working as expected. The reproducer is in [1] where the flags “-Wall -Werror” are enabled, but there is no warning and compiler may take advantage of uninit variable to do aggressive optimization.

In discussion [2], various approaches are discussed, e.g., improving compiler to detect undefined behavior due to uninitialized variables, trying to use ubsan (-fsanitize=undefined), and making -ftrivial-auto-var-init=zero as the bpf default flags.

I tried [3] with -ftrivial-auto-var-init=zero and eventually we decided no-go since first it may introduce performance regression and second the prog may still be wrong if the prog expects a non-zero value. The ubsan apprach seems not working as well since it involves runtime callback func ([4]).

The approach here is not to do complicate compiler analysis to detect whether where is undef behavior which may impact final codegen. Rather, we relies on compiler to do its normal transformation and at later IR passes stage, a BPF backend pass is inserted to check whether undef behavior is in IR or not. Note that if undef behavior indeed impacts codes, the compiler will discard those related codes with simple 'undef' or 'unreachable'.

For example, for the case [1], before SCCPPass, the IR looks like

define dso_local i32 @repro(ptr noundef %0) #0 section "classifier" {
  %2 = alloca %struct.ipv6_opt_hdr, align 8
  %3 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @repro.____fmt, i32 noundef 6) #2
  %4 = tail call ptr asm sideeffect "$0 = *(u32 *)($1 + $2)", "=r,r,i"(ptr %0, i64 76) #2, !srcloc !3
  %5 = ptrtoint ptr %4 to i64
  %6 = trunc i64 %5 to i32
  %7 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @icmp6_ndisc_validate.____fmt, i32 noundef 23) #2
  call void @llvm.lifetime.start.p0(i64 2, ptr nonnull %2) #2
  %8 = getelementptr inbounds nuw i8, ptr %2, i64 1
  switch i8 undef, label %51 [
    i8 59, label %56
    i8 44, label %57
    i8 0, label %9
    i8 43, label %9
    i8 51, label %9
    i8 60, label %9
  ]

9:                                                ; preds = %1, %1, %1, %1
  %10 = sub i32 40, %6
  ...

Note that 'undef' is used for switch key due to one of early pass LoopFullUnrollPass. After SCCPPass:

efine dso_local i32 @repro(ptr noundef %0) #0 section "classifier" {
  %2 = alloca %struct.ipv6_opt_hdr, align 8
  %3 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @repro.____fmt, i32 noundef 6) #2
  %4 = tail call ptr asm sideeffect "$0 = *(u32 *)($1 + $2)", "=r,r,i"(ptr %0, i64 76) #2, !srcloc !3
  %5 = ptrtoint ptr %4 to i64
  %6 = trunc i64 %5 to i32
  %7 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @icmp6_ndisc_validate.____fmt, i32 noundef 23) #2
  call void @llvm.lifetime.start.p0(i64 2, ptr nonnull %2) #2
  %8 = getelementptr inbounds nuw i8, ptr %2, i64 1
  unreachable
}

Besides the above case, the following three patterns are also covered:

  • It is possible llvm may generate codes where a default branch to 'unreachable' location. Ignore such 'unreachable' instances. See [5] or some comments in [2].
  • Handle pattern like __bpf_unreachable (defined in bpf_helpers.h).
  • Functions with naked attribute will have 'unreachable' at the end of function. Ignore such functions.

A bpf flag -bpf-disable-check-unreachable-ir is introduced to disable this checking.

Tested with bpf selftests and there are no errors issued.

[1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3
[2] https://discourse.llvm.org/t/detect-undefined-behavior-due-to-uninitialized-variables-in-bpf-programs/84116?u=yonghong-song
[3] #125601
[4] https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/ubsan/ubsan_interface.inc
[5] https://lore.kernel.org/lkml/[email protected]/

@yonghong-song
Copy link
Contributor Author

cc @anakryiko @jemarch

Copy link

github-actions bot commented Feb 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

github-actions bot commented Feb 12, 2025

✅ With the latest revision this PR passed the undef deprecator.

Value *RetValue = I->getReturnValue();
// PoisonValue is a special UndefValue where compiler intentionally to
// poisons a value since it shouldn't be used.
if (!RetValue || isa<PoisonValue>(RetValue) || !isa<UndefValue>(RetValue))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like undef is deprecated.
Do we have to check it or PoisonValue is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some extreme case, e.g.,

int foo(void) {
  int i[2];
  return i[1];
}

The eventual IR will look like

define dso_local i32 @foo() #0 {
  ret i32 undef
}

For such cases, checking !isa<PoisonValue>(RetValue) is not enough, we should check !isa<UndefValue>(RetValue) to ensure to report something wrong to user.

But such case should be really rare and it is easier for user to find out what is going on.

The prog Marc Suñé reported does not need the above check. Since upstream doesn't like to involve 'undef', I will remove the above checking and also remove the simple test which results in IR 'ret i32 undef' in the next revision.

@yonghong-song
Copy link
Contributor Author

clang-format still does not like 'undef' in the test. Will try not to have 'undef' in IR.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM/Clang generally do not allow the use of backend-generated warnings, because they produce unintelligible, optimization-dependent false-positives. There are some specific exceptions, but what you're doing here is basically exactly the kind of backend warning we want to avoid.

Does BPF have any kind of support for trapping or otherwise indicating an error? There is existing support to compile unreachable to a trap instruction (TrapUnreachable), but from a quick look, it doesn't seem like there is any BPF instruction this can be mapped to?

}

dbgs() << "WARNING: unreachable in func " << F.getName()
<< ", due to uninitialized variable?\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbgs() is not a suitable method to report user-visible warnings. This needs to go through DiagnosticInfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I just updated a change to use DiagnosticInfo.

@yonghong-song
Copy link
Contributor Author

LLVM/Clang generally do not allow the use of backend-generated warnings, because they produce unintelligible, optimization-dependent false-positives. There are some specific exceptions, but what you're doing here is basically exactly the kind of backend warning we want to avoid.

The backend warning here is BPF specific as several conditions are used to check before issuing the warning.

Does BPF have any kind of support for trapping or otherwise indicating an error? There is existing support to compile unreachable to a trap instruction (TrapUnreachable), but from a quick look, it doesn't seem like there is any BPF instruction this can be mapped to?

No, BPF arch does not have trapping insn or another other insn similar to trap/error.
BPF code is verified by the linux kernel. The verifier will do flow sensitive analysis to decide whether the prog is sane or not (memory corruption, out of bound, etc.).

I am not sure whether bpf should have trap insn or not. For a prog passing verifier, it should not trap in the kernel. Let us say we do introduce trap insn in BPF ISA, and llvm indeed inserts one which implies some path will hit 'trap' so probably we should error out at compile time. But it has slightly chance compiler may have a false positive, so I think a warning seems more appropriate.

@yonghong-song
Copy link
Contributor Author

In my latest change, I remove the selftest due to clang-format does not like 'undef' in the IR. If anybody wants to run it at llvm/test/CodeGen/BPF, the gist link is
https://gist.github.com/yonghong-song/11be8603ad9422f418f60b41beded047

@yonghong-song yonghong-song changed the title [BPF] Report Undefined Behavior from IR [BPF] Report Unreachable Behavior from IR Feb 21, 2025
@yonghong-song
Copy link
Contributor Author

Update a new revision with the following key changes:

  • By default, an error will be reported for a 'unreachable' IR if the BPF backend decides it is an error.
  • Add a flag -bpf-disable-check-unreachable-ir to allow users to disable this pass just in case for some false positives.

}

F.getContext().diagnose(
DiagnosticInfoGeneric(Twine("unreachable in func ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that reporting file:line would still be useful in some cases.
If the error could say that in function FOO from line X to line Y that code was deleted as unreachable
that would help users to address the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, if debuginfo (-g) is available, the error message will be something like

error: in func repro from line 155 to the end of func that code was deleted as unreachable,
          due to uninitialized variable? try -Wuninitialized?

Without debuginfo, the error message will be like

error: in func repro that code was deleted as unreachable,
          due to uninitialized variable? try -Wuninitialized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: in func repro from line 155 to the end of func that code was deleted as unreachable,
due to uninitialized variable? try -Wuninitialized?

this looks better, maybe add '' around function name like we do elsewhere and don't shorten "function" to "func" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new message looks like:

error: in function "repro" from line 155 to the end of function that code was deleted as unreachable.
       due to uninitialized variable? try -Wuninitialized?

Yonghong Song added 4 commits February 24, 2025 08:41
Marc Suñé (Isovalent, part of Cisco) reported an issue where an
uninitialized variable caused generated bpf prog binary code not
working as expected. The reproducer is in [1] where the flags
“-Wall -Werror” are enabled, but there is no warning and compiler
may take advantage of uninit variable to do aggressive optimization.

In discussion [2], various approaches are discussed, e.g., improving
compiler to detect undefined behavior due to uninitialized variables,
trying to use ubsan (-fsanitize=undefined), and making
-ftrivial-auto-var-init=zero as the bpf default flags.

I tried [3] with -ftrivial-auto-var-init=zero and eventually we
decided no-go since first it may introduce performance regression
and second the prog may still be wrong if the prog expects a
non-zero value. The ubsan apprach seems not working as well
since it involves runtime callback func ([4]).

The approach here is not to do complicate compiler analysis to detect
whether where is undef behavior which may impact final codegen.
Rather, we relies on compiler to do its normal transformation
and at later IR passes stage, a BPF backend pass is inserted to
check whether undef behavior is in IR or not. Note that
if undef behavior indeed impacts codes, the compiler will discard
those related codes with simple 'undef' or 'unreachable'.

For example, for the case [1], before SCCPPass, the IR looks like
```
define dso_local i32 @repro(ptr noundef %0) #0 section "classifier" {
  %2 = alloca %struct.ipv6_opt_hdr, align 8
  %3 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @repro.____fmt, i32 noundef 6) llvm#2
  %4 = tail call ptr asm sideeffect "$0 = *(u32 *)($1 + $2)", "=r,r,i"(ptr %0, i64 76) llvm#2, !srcloc !3
  %5 = ptrtoint ptr %4 to i64
  %6 = trunc i64 %5 to i32
  %7 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @icmp6_ndisc_validate.____fmt, i32 noundef 23) llvm#2
  call void @llvm.lifetime.start.p0(i64 2, ptr nonnull %2) llvm#2
  %8 = getelementptr inbounds nuw i8, ptr %2, i64 1
  switch i8 undef, label %51 [
    i8 59, label %56
    i8 44, label %57
    i8 0, label %9
    i8 43, label %9
    i8 51, label %9
    i8 60, label %9
  ]

9:                                                ; preds = %1, %1, %1, %1
  %10 = sub i32 40, %6
  ...
```
Note that 'undef' is used for switch key due to one of early pass LoopFullUnrollPass.
After SCCPPass:
```
efine dso_local i32 @repro(ptr noundef %0) #0 section "classifier" {
  %2 = alloca %struct.ipv6_opt_hdr, align 8
  %3 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @repro.____fmt, i32 noundef 6) llvm#2
  %4 = tail call ptr asm sideeffect "$0 = *(u32 *)($1 + $2)", "=r,r,i"(ptr %0, i64 76) llvm#2, !srcloc !3
  %5 = ptrtoint ptr %4 to i64
  %6 = trunc i64 %5 to i32
  %7 = tail call i64 (ptr, i32, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull @icmp6_ndisc_validate.____fmt, i32 noundef 23) llvm#2
  call void @llvm.lifetime.start.p0(i64 2, ptr nonnull %2) llvm#2
  %8 = getelementptr inbounds nuw i8, ptr %2, i64 1
  unreachable
}
```

For another example,
```
$ cat t.c
int foo() {
  int i[2];
  return i[1];
}

```
Before SROAPass pass,
```
define dso_local i32 @foo() #0 {
  %1 = alloca [2 x i32], align 4
  call void @llvm.lifetime.start.p0(i64 8, ptr %1) llvm#2
  %2 = getelementptr inbounds [2 x i32], ptr %1, i64 0, i64 1
  %3 = load i32, ptr %2, align 4, !tbaa !3
  call void @llvm.lifetime.end.p0(i64 8, ptr %1) llvm#2
  ret i32 %3
}
```
After SROAPass pass,
```
define dso_local i32 @foo() #0 {
  ret i32 undef
}
```

Besides the above two test cases, the following three patterns are also covered:
  - It is possible llvm may generate codes where a default branch to 'unreachable' location.
    Ignore such 'unreachable' instances. See [5] or some comments in [2].
  - Handle pattern like __bpf_unreachable (defined in bpf_helpers.h).
  - Functions with naked attribute will have 'unreachable' at the end of function.
    Ignore such functions.

Tested with bpf selftests and there are no warnings issued.

  [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3
  [2] https://discourse.llvm.org/t/detect-undefined-behavior-due-to-uninitialized-variables-in-bpf-programs/84116?u=yonghong-song
  [3] llvm#125601
  [4] https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/ubsan/ubsan_interface.inc
  [5] https://lore.kernel.org/lkml/[email protected]/
Upstream does not like to check undef value and clang-format will
fail due to this. Let us remove checking for returning undef value.
A related test is also removed.
The test undef-sccp.ll still uses 'undef' in the IR and clang-format
complains it. In this particular case, 'undef' is generated in sroa
and it needs a lot of other passes to reach sccp. So let us remove
this test. The test itself can be accessed in
  https://gist.github.com/yonghong-song/11be8603ad9422f418f60b41beded047
which can be tested in llvm/test/CodeGen/BPF directory.

DiagnosticInfo() is the recommended interface for warnings comparing to
dbgs().
Report an error by default. Add a new flag -bpf-disable-check-unreachable-ir
to disable checking. Depend on whether debuginfo is available or not,
the error message will look like
  in func <func> from line <line num> to the end of func that code was deleted as unreachable,
     due to uninitialized variable? try -Wuninitialized?
or
  in func <func> that code was deleted as unreachable,
     due to uninitialized variable? try -Wuninitialized?
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is BPF-specific, it is introducing constraints on target-independent transforms: specifically, any transform which may introduce unreachable code is illegal.

As such, I think this needs an RFC to determine how to go forward. The needs to describe the general impact on transformations, and whether there are any existing transforms impacted by the constraint.

@yonghong-song yonghong-song changed the title [BPF] Report Unreachable Behavior from IR [RFC][BPF] Report Unreachable Behavior from IR Feb 24, 2025
@yonghong-song
Copy link
Contributor Author

Even though this is BPF-specific, it is introducing constraints on target-independent transforms: specifically, any transform which may introduce unreachable code is illegal.

As such, I think this needs an RFC to determine how to go forward. The needs to describe the general impact on transformations, and whether there are any existing transforms impacted by the constraint.

@efriedma-quic I marked this patch as RFC. I think the key thing is about when 'unreachable' is introduced today in middle end and what is the expected scenario 'unreachable' could be introduced in the future. For the 'unreachable' introduced in the future, I guess it should be okay. The BPF can be adjusted as necessary, similar to other transformations.

@efriedma-quic
Copy link
Collaborator

On a conventional target, in general, code with undefined behavior is fine, as long as it doesn't actually execute at runtime. People write source code based on this, and transforms duplicate code based on this. And this makes diagnostics in the backend impractical.

I understand that BPF operates under different constraints than conventional targets... in particular, BPF-C is a subset of C. It only allows programs that pass the BPF verifier. So maybe you don't run into exactly the same issues with late diagnostics. But, I'm not sure what the implications are here for transforms. For example, can JumpThreading introduce an "unreachable" that triggers the error here? If so, do we need to restrict JumpThreading on BPF?

@yonghong-song
Copy link
Contributor Author

On a conventional target, in general, code with undefined behavior is fine, as long as it doesn't actually execute at runtime. People write source code based on this, and transforms duplicate code based on this. And this makes diagnostics in the backend impractical.

I understand that BPF operates under different constraints than conventional targets... in particular, BPF-C is a subset of C. It only allows programs that pass the BPF verifier. So maybe you don't run into exactly the same issues with late diagnostics. But, I'm not sure what the implications are here for transforms. For example, can JumpThreading introduce an "unreachable" that triggers the error here? If so, do we need to restrict JumpThreading on BPF?

JumpThreading works for BPF target as well and the middle end optimization does not need to restrict JumpThreading for BPF target. The bpf backend pass implemented in this patch will handle such code patterns (BPF backend will do some analysis if necessary). If in the future, middle end optimization has new cases which introduce unreachable, BPF backend will try to handle handle such new patterns if necessary and this should be fine as they typically will be in the same release.

So in summary, middle end optimization does not need to do any special things for BPF backend.

@yonghong-song
Copy link
Contributor Author

cc @WenleiHe

@eddyz87
Copy link
Contributor

eddyz87 commented Feb 26, 2025

I was curious if there are other cases besides switch instruction, where unreachable could survive up to codegen. Unfortunately, unreachable is generated in a lot of places with complex logic, e.g. loop fusion or SCCP. It looks like it is impossible to know the answer for sure.

Given this, I tend to agree that a runtime trap would be a simpler solution. E.g. something along the lines libbpf does here, where it inserts a call to unknown helper function. In kernel verifier.c:get_helper_proto() is only called from check_helper_call() on main verification pass. Meaning that if unreachable is a dead code, verifier would delete it, otherwise it would report an error.

Verifier can be changed to report something about undefined behaviour trap to simplify debugging. BPF backend can be changed to avoid clobbering registers because of this special trap call.

Just my 5 cents.

@yonghong-song
Copy link
Contributor Author

I was curious if there are other cases besides switch instruction, where unreachable could survive up to codegen. Unfortunately, unreachable is generated in a lot of places with complex logic, e.g. loop fusion or SCCP. It looks like it is impossible to know the answer for sure.

Could you share the details about these experiments? I would like to see whether I missed anything which we should cover?

Given this, I tend to agree that a runtime trap would be a simpler solution. E.g. something along the lines libbpf does here, where it inserts a call to unknown helper function. In kernel verifier.c:get_helper_proto() is only called from check_helper_call() on main verification pass. Meaning that if unreachable is a dead code, verifier would delete it, otherwise it would report an error.

Regarding to runtime trap, I actually examined this as well before going to compiler approach. That approach is to change 'unreachable' to some newly created trap insn and later on when verifier reaches these trap insn, verification will fail. But I did a few examples and found that only limited patterns so I prefer to use the compiler approach. Note that the 'unreachable' checking is done at roughly the end of middle end optimizaiton (i.e. beginning of backend IR passes).

So let us gather more examples here before considering going to bpf verifier to deal with these 'unreachable' traps. Note that there is risk that these 'unreachable' traps may be rejected by verifier but actually it may be safe because of verifier some in-precise analysis. So the best way is to do at runtime, but it has its own complication.

Verifier can be changed to report something about undefined behaviour trap to simplify debugging. BPF backend can be changed to avoid clobbering registers because of this special trap call.

So my suggestion is to let us do some thorough analysis at compile time. Note that it is always better for compiler to flag something first. If eventually it turns out there are too many patterns which may have fake unreachables, then we could consider run-time approach then.

Just my 5 cents.

@eddyz87
Copy link
Contributor

eddyz87 commented Feb 27, 2025

I was curious if there are other cases besides switch instruction, where unreachable could survive up to codegen. Unfortunately, unreachable is generated in a lot of places with complex logic, e.g. loop fusion or SCCP. It looks like it is impossible to know the answer for sure.

Could you share the details about these experiments? I would like to see whether I missed anything which we should cover?

I did not do any experiments, tried to analyze places where unreachable is inserted and found logic to be too complex to try and produce any examples. E.g. lvm/lib/Transforms/Scalar/LoopFuse.cpp inserts unreachable in 6 places. W/o some understanding of loop fusion algorithm I can't produce an example.

One option would be to copy this transformation to x86 backend and e.g. compile Linux kernel or some other big project, to see which code patterns introduce unreachable in practice (if any). I can try it some time later.

@yonghong-song
Copy link
Contributor Author

I was curious if there are other cases besides switch instruction, where unreachable could survive up to codegen. Unfortunately, unreachable is generated in a lot of places with complex logic, e.g. loop fusion or SCCP. It looks like it is impossible to know the answer for sure.

Could you share the details about these experiments? I would like to see whether I missed anything which we should cover?

I did not do any experiments, tried to analyze places where unreachable is inserted and found logic to be too complex to try and produce any examples. E.g. lvm/lib/Transforms/Scalar/LoopFuse.cpp inserts unreachable in 6 places. W/o some understanding of loop fusion algorithm I can't produce an example.

One option would be to copy this transformation to x86 backend and e.g. compile Linux kernel or some other big project, to see which code patterns introduce unreachable in practice (if any). I can try it some time later.

Thanks. It would be great to collect some statistics. I only tried with bpf programs in kernel bpf selftests. Maybe trying with other architectures can expose more patterns.

@yonghong-song
Copy link
Contributor Author

I also discussed with @WenleiHe a little bit about this. Another solution is to extend 'unreachable' insn in LLVM. For example, we could have 'unreachable' like below
unreachable
The reason could include

  • due to switch statement where unreachable is truely unreachable
  • due to generating llvm.trap where any code after that is truely unreachable
  • maybe unreachable for naked function
  • unreachable due to code is optimized by taking advantage of undef behavior and it is actually reachable.

With carried and passed from various optimization, downstream can reason about 'unreachable' insn. This will make BPF backend pass easier to only trigger error for due to undef bahavior.

WDYT? Is this a reasonable approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants